Robust Mean Estimation in High Dimensions: An Outlier-Fraction Agnostic and Efficient Algorithm
Abstract
The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the ℓ0-"norm" of an outlier indicator vector, under a second moment constraint on the datapoints. The ℓ0-"norm" is then relaxed to the ℓp-norm (0 < p ≤ 1) in the objective, and it is shown that the global minima for each of these objectives are order-optimal and have optimal breakdown point for the robust mean estimation problem. Furthermore, a computationally tractable iterative ℓp-minimization and hard thresholding algorithm is proposed that outputs an estimate of the population mean. The proposed algorithm (with breakdown point ≈ 0.3) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for p = 1 it has near-linear time complexity. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art methods.
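The abstract names the ingredients (an outlier indicator vector, a second moment constraint, and thresholding) but not the algorithm itself. As a rough illustration of the ℓ0 formulation only, the Python sketch below greedily flags one point at a time as an outlier until the top eigenvalue of the remaining points' covariance falls below a bound. This is NOT the authors' iterative ℓp-minimization and hard thresholding algorithm; it is a plainly swapped-in greedy heuristic with none of the guarantees stated above. The function names (`greedy_trim_mean`, `top_eigen`, and so on), the bound `cov_bound = 2.0` (assuming inlier covariance close to the identity), and the synthetic data are all hypothetical. Like the proposed algorithm, it takes no outlier fraction as input.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(points: Sequence[Vector]) -> Vector:
    """Coordinate-wise empirical mean of a non-empty list of d-dimensional points."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def covariance(points: Sequence[Vector], mu: Vector) -> List[Vector]:
    """Empirical covariance (1/n normalization) of the points around the center mu."""
    n, d = len(points), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        c = [p[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += c[a] * c[b] / n
    return cov


def top_eigen(mat: List[Vector], iters: int = 200) -> Tuple[float, Vector]:
    """Approximate largest eigenvalue and unit eigenvector of a symmetric PSD matrix
    by power iteration from a fixed random start."""
    d = len(mat)
    rng = random.Random(0)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    s = sum(x * x for x in v) ** 0.5
    v = [x / s for x in v]
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # zero matrix: every eigenvalue is 0
            return 0.0, v
        lam, v = norm, [x / norm for x in w]  # ||M v|| for unit v estimates the top eigenvalue
    return lam, v


def greedy_trim_mean(points: Sequence[Vector], cov_bound: float = 2.0,
                     max_remove_frac: float = 0.5) -> Tuple[Vector, List[int]]:
    """HYPOTHETICAL greedy heuristic for the l0 'outlier indicator' formulation.

    Flags one point at a time as an outlier until the top eigenvalue of the kept
    points' covariance is at most cov_bound (the second-moment constraint). No
    outlier fraction is supplied; max_remove_frac only caps removals at half the points.
    """
    kept = list(range(len(points)))
    max_removed = int(max_remove_frac * len(points))
    while len(points) - len(kept) < max_removed:
        sub = [points[i] for i in kept]
        mu = mean(sub)
        lam, v = top_eigen(covariance(sub, mu))
        if lam <= cov_bound:
            break  # constraint satisfied: stop flagging points
        # Squared projection of each kept point onto the highest-variance direction v.
        scores = [sum((points[i][j] - mu[j]) * v[j] for j in range(len(mu))) ** 2
                  for i in kept]
        kept.pop(max(range(len(kept)), key=scores.__getitem__))  # flag the most extreme point
    return mean([points[i] for i in kept]), kept


def dist_to_origin(m: Vector) -> float:
    """Euclidean norm; the true mean in the demo below is the origin."""
    return sum(x * x for x in m) ** 0.5


if __name__ == "__main__":
    rng = random.Random(42)
    d = 5
    inliers = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(200)]  # N(0, I)
    outliers = [[8.0] * d for _ in range(20)]  # corrupted cluster far from the true mean
    data = inliers + outliers
    est, kept = greedy_trim_mean(data)
    print("naive mean error:  ", round(dist_to_origin(mean(data)), 3))
    print("trimmed mean error:", round(dist_to_origin(est), 3))
    print("points flagged:    ", len(data) - len(kept))
```

Flagging a single point per pass keeps the sketch short, but each pass recomputes a full covariance and top eigenvector, so it makes no claim to the near-linear running time reported for p = 1.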
Similar resources
Computationally Efficient Robust Sparse Estimation in High Dimensions
Many conventional statistical procedures are extremely sensitive to seemingly minor deviations from modeling assumptions. This problem is exacerbated in modern high-dimensional settings, where the problem dimension can grow with and possibly exceed the sample size. We consider the problem of robust estimation of sparse functionals, and provide a computationally and statistically efficient algor...
MONK - Outlier-Robust Mean Embedding Estimation by Median-of-Means
Mean embeddings provide an extremely flexible and powerful tool in machine learning and statistics to represent probability distributions and define a semi-metric (MMD, maximum mean discrepancy; also called N-distance or energy distance), with numerous successful applications. The representation is constructed as the expectation of the feature map defined by a kernel. As a mean, its classical e...
Robust Sparse Estimation Tasks in High Dimensions
In this paper we initiate the study of whether or not sparse estimation tasks can be performed efficiently in high dimensions, in the robust setting where an ε-fraction of samples are corrupted adversarially. We study the natural robust version of two classical sparse estimation problems, namely, sparse mean estimation and sparse PCA in the spiked covariance model. For both of these problems, w...
Outlier identification in high dimensions
A computationally fast procedure for identifying outliers is presented, that is particularly effective in high dimensions. This algorithm utilizes simple properties of principal components to identify outliers in the transformed space, leading to significant computational advantages for high dimensional data. This approach requires considerably less computational time than existing methods for ...
Global High Dimension Outlier Algorithm for Efficient Clustering and Outlier Detection
In this digital era, most knowledge is available in digital form. For several years, individuals have held the hypothesis that using phrases to represent documents and topics ought to perform better than terms. In this paper we examine and investigate this claim, considering many state-of-the-art data processin...
Journal
Journal title: IEEE Transactions on Information Theory
Year: 2023
ISSN: 0018-9448, 1557-9654
DOI: https://doi.org/10.1109/tit.2023.3249197